Smart Paper Notes

Multi-Agent Sequential Decision-Making via Communication

Ziluo Ding, Kefan Su, Weixin Hong, Liwen Zhu, Tiejun Huang, Zongqing Lu

Category: Machine Learning

2022-09-26

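Communication helps agents obtain information about others so that better coordinated behavior can be learned. Some existing work communicates predicted future trajectories to other agents, hoping to give them clues about what others will do and so improve coordination. However, when agents are treated synchronously, circular dependencies can sometimes arise, which makes it hard to coordinate decision-making. In this paper, we propose a novel communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (upper-level agents make decisions before lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of their observations and comparing the values of their intentions, which are obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various multi-agent cooperative tasks.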
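
The ordering idea in this abstract can be illustrated with a minimal sketch. It assumes the negotiation phase has already produced one scalar intention value per agent; the function name `sequential_decisions`, the agent names, and the policy callables are all hypothetical and are not taken from the paper.

```python
def sequential_decisions(intention_values, policies):
    """Decide in priority order; each agent sees the actions of agents ranked above it.

    intention_values: dict mapping agent name -> scalar value (assumed to come
                      from a negotiation phase that is not modeled here).
    policies:         dict mapping agent name -> callable taking the dict of
                      actions already chosen by higher-priority agents.
    """
    # Higher intention value means higher decision priority (an assumption).
    order = sorted(intention_values, key=intention_values.get, reverse=True)
    actions = {}
    for agent in order:
        # Pass a copy so a policy cannot mutate the shared record.
        actions[agent] = policies[agent](dict(actions))
    return order, actions


# Toy policies: each agent's "action" is the number of upstream actions it saw.
values = {"a": 0.2, "b": 0.9, "c": 0.5}
policies = {name: (lambda upstream: len(upstream)) for name in values}
order, actions = sequential_decisions(values, policies)
```

Because every agent conditions only on agents ranked strictly above it, the decision graph is acyclic, which is the property the abstract contrasts with synchronous schemes.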

Translated by Google Translate

How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

Bowen Zhang, Daijun Ding, Liwen Jing

Category: Natural Language Processing

2022-12-30

Stance detection refers to the task of extracting the standpoint (Favor, Against, or Neither) towards a target in given texts. Such research has gained increasing attention with the proliferation of social media content. The conventional framework for handling stance detection is to convert it into a text classification task. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks face two main challenges: insufficient labeled data and information in social media posts, and the unexplainable nature of deep learning models. A new pre-trained language model, ChatGPT, was launched on Nov 30, 2022. For stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance on commonly used datasets including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanations for its own predictions, which is beyond the capability of any existing model. The explanations for the cases in which it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least to change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.


PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling

Guanting Dong, Daichi Guo, Liwen Wang, Xuefeng Li, Zechen Wang, Chen Zeng, Keqing He, Jinzheng Zhao, Hao Lei, Xinyue Cui

Category: Natural Language Processing

2022-08-24

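Most existing slot filling models tend to memorize the inherent patterns of entities and their corresponding contexts in the training data. However, these models can cause system failures or undesirable outputs when exposed to spoken-language perturbations or variations in practice. We propose a perturbed semantic structure awareness transferring method for training perturbation-robust slot filling models. Specifically, we introduce two training strategies based on masked language modeling (MLM) to learn contextual semantic structure and word distribution, respectively, from an unsupervised corpus of language perturbations. We then transfer the semantic knowledge learned in this upstream training procedure to the original samples and filter the generated data through consistency processing. These procedures aim to enhance the robustness of slot filling models. Experimental results show that our method consistently outperforms previous basic methods and achieves strong generalization while preventing the model from memorizing the inherent patterns of entities and contexts.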


UC-OWOD: Unknown-Classified Open World Object Detection

Zhiheng Wu, Yue Lu, Xingyu Chen, Zhengxing Wu, Liwen Kang, Junzhi Yu

Category: Computer Vision

2022-07-23

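Open World Object Detection (OWOD) is a challenging computer vision problem that requires detecting unknown objects and gradually learning the identified unknown classes. However, it cannot distinguish unknown instances as belonging to multiple unknown classes. In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). UC-OWOD aims to detect unknown instances and classify them into different unknown classes. In addition, we formulate the problem and design a two-stage object detector to solve UC-OWOD. First, unknown label-aware proposals and an unknown-discriminative classification head are used to detect known and unknown objects. Then, similarity-based unknown classification and unknown clustering refinement modules are constructed to distinguish multiple unknown classes. Moreover, two novel evaluation protocols are designed to evaluate unknown-class detection. Extensive experiments and visualizations demonstrate the effectiveness of the proposed method. Code is available at https://github.com/johnwuzh/uc-owod.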


A Robust Visual Sampling Model Inspired by Receptive Field

Liwen Hu, Lei Ma, Dawei Weng, Tiejun Huang

Category: Computer Vision

2022-01-04

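The spike camera, which mimics the retinal fovea, reports the accumulated luminance intensity at each pixel by firing spikes. As a bio-inspired vision sensor with high temporal resolution, it has great potential for computer vision. However, the sampling model in current spike cameras is so susceptible to quantization and noise that it cannot effectively capture the texture details of objects. In this work, we propose a robust visual sampling model inspired by the receptive field (RVSM), in which wavelet filters generated from difference of Gaussians (DoG) and Gaussian filters are used to simulate the receptive field. Using a corresponding method similar to the inverse wavelet transform, spike data from RVSM can be converted into images. To test performance, we also propose a high-speed motion spike dataset (HMD) that includes a variety of motion scenes. By comparing reconstructed images on HMD, we find that RVSM greatly improves the spike camera's ability to capture information. More importantly, because it mimics the receptive field mechanism to collect regional information, RVSM can effectively filter high-intensity noise and largely alleviates the spike camera's sensitivity to noise. Moreover, owing to the generality of the sampling structure, RVSM is also suitable for other neuromorphic vision sensors. The above experiments were carried out in a spike camera simulator.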
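
For readers unfamiliar with the difference-of-Gaussians (DoG) filter named above, the following is a minimal sketch of how such a center-surround kernel can be built. The kernel size and the two sigma values are illustrative assumptions, not parameters from the paper, and the sketch covers only the DoG kernel, not the full RVSM sampling model.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel (list of rows) normalized to sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def dog_kernel(size, sigma_center, sigma_surround):
    """Difference of Gaussians: narrow 'center' kernel minus wide 'surround' kernel."""
    center = gaussian_kernel(size, sigma_center)
    surround = gaussian_kernel(size, sigma_surround)
    return [[c - s for c, s in zip(rc, rs)] for rc, rs in zip(center, surround)]


# Illustrative values only (assumptions).
kernel = dog_kernel(size=7, sigma_center=1.0, sigma_surround=2.0)
```

Since both Gaussians are normalized, the kernel sums to approximately zero: it responds to local contrast rather than to uniform brightness, with a positive center and a negative surround.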


Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

Category: Computer Vision | Artificial Intelligence | Natural Language Processing

2021-12-21

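In today's era of digital misinformation, we increasingly face new threats posed by video falsification techniques. Such falsifications range from cheapfakes to deepfakes (e.g., sophisticated AI media synthesis methods), which are becoming perceptually indistinguishable from real videos. To tackle this challenge, we propose a multi-modal semantic forensic approach that discovers clues beyond discrepancies in visual quality, thereby handling both simpler cheapfakes and visually persuasive deepfakes. In this work, our goal is to verify that the purported person seen in a video is indeed that person, by detecting anomalous correspondences between their facial movements and the words they are saying. We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others. We use interpretable Action Units (AUs) to capture a person's face and head movement, rather than deep CNN visual features, and we are the first to use word-conditioned facial motion analysis. Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation. We further demonstrate the effectiveness of our method on a range of fakes not seen in training, including those without video manipulation, which were not addressed in prior work.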


DIVeR: Real-time and Accurate Neural Radiance Fields with Deterministic Integration for Volume Rendering

Liwen Wu, Jae Yong Lee, Anand Bhattad, Yuxiong Wang, David Forsyth

Category: Computer Vision

2021-11-19

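DIVeR builds on the key ideas of NeRF and its variants (density models and volume rendering) to learn 3D object models that can be rendered realistically from a small number of images. In contrast to all previous NeRF methods, DIVeR uses deterministic rather than stochastic estimates of the volume rendering integral. DIVeR's representation is a voxel-based field of features. To compute the volume rendering integral, a ray is broken into intervals, one per voxel; components of the volume rendering integral are estimated from the features of each interval using an MLP, and the components are aggregated. As a result, DIVeR can render thin translucent structures that other integrators miss. Furthermore, DIVeR's representation has semantics that are relatively exposed compared to other such methods: moving feature vectors around in the voxel space results in natural edits. Extensive qualitative and quantitative comparisons to current state-of-the-art methods show that DIVeR produces models that (1) render at or above state-of-the-art quality, (2) are very small without being baked, (3) render very fast without being baked, and (4) can be edited in natural ways.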
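
The aggregation step mentioned above can be sketched with standard front-to-back alpha compositing. This is a minimal illustration, not DIVeR's implementation: it assumes each ray interval has already been reduced to an opacity and a color (in the paper an MLP estimates the per-interval components from voxel features; here they are plain inputs), and the function name `composite` is hypothetical.

```python
def composite(intervals):
    """Aggregate per-interval contributions along one ray, front to back.

    intervals: list of (alpha, (r, g, b)) tuples, ordered from the camera
               outward, where alpha in [0, 1] is the interval's opacity.
    Returns the accumulated color and the remaining transmittance.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for alpha, rgb in intervals:
        weight = transmittance * alpha
        color = [c + weight * v for c, v in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance


# A half-opaque red interval in front of a fully opaque blue one.
color, remaining = composite([(0.5, (1.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.0))])
# color == [0.5, 0.0, 0.5], remaining == 0.0
```

Because the intervals are fixed by the voxel grid rather than sampled at random, the same ray always produces the same sum, which is the sense in which the integration is deterministic.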


Optical Flow Estimation for Spiking Camera

Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang

Category: Computer Vision

2021-10-08

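As a bio-inspired sensor with high temporal resolution, the spiking camera has enormous potential in real applications, especially for motion estimation in high-speed scenes. However, because the data modalities differ, frame-based and event-based methods are not well suited to the spike streams produced by a spiking camera. To this end, we present SCFlow, a tailored deep learning pipeline for estimating optical flow in high-speed scenes from spike streams. Importantly, a novel input representation is introduced that can adaptively remove motion blur from spike streams according to the prior motion. Furthermore, to train SCFlow, we synthesize two sets of optical flow data for the spiking camera, SPIkingly Flying Things and Photo-realistic High-speed Motion, denoted SPIFT and PHM respectively, which correspond to random high-speed scenes and well-designed scenes. Experimental results show that SCFlow can predict optical flow from spike streams in different high-speed scenes. Moreover, SCFlow shows promising generalization to real spike streams. All code and constructed datasets will be released after publication.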


Cluster-guided Contrastive Graph Clustering Network

Xihong Yang, Yue Liu, Sihang Zhou, Siwei Wang, Wenxuan Tu, Qun Zheng, Xinwang Liu, Liming Fang, En Zhu

Category: Machine Learning

2023-01-03

Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls samples from the same cluster together while pushing away those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
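
The objective described in the last steps can be illustrated with a simplified sketch. It is not the loss from the paper: the form `(1 - cos)` for positives plus the mean cosine for negatives, and every function name below, are assumptions chosen only to show positives drawn from the same high-confidence cluster across two views and negatives drawn from the centers of different clusters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_centers(view, labels, confident):
    """Mean embedding of the high-confidence nodes in each cluster."""
    groups = {}
    for emb, lab, ok in zip(view, labels, confident):
        if ok:
            groups.setdefault(lab, []).append(emb)
    return {lab: [sum(col) / len(embs) for col in zip(*embs)]
            for lab, embs in groups.items()}


def cluster_guided_loss(view1, view2, labels, confident):
    """Illustrative loss: pull same-cluster pairs together, push centers apart."""
    idx = [i for i, ok in enumerate(confident) if ok]
    # Positives: cross-view pairs of confident nodes sharing a cluster label.
    pos = [cosine(view1[i], view2[j]) for i in idx for j in idx
           if labels[i] == labels[j]]
    # Negatives: cross-view pairs of centers of different clusters.
    c1 = cluster_centers(view1, labels, confident)
    c2 = cluster_centers(view2, labels, confident)
    neg = [cosine(c1[a], c2[b]) for a in c1 for b in c2 if a != b]
    pos_term = sum(1.0 - s for s in pos) / len(pos)
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


# Two toy views of four nodes; the first two and last two form natural clusters.
v1 = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
v2 = [[1.0, 0.05], [0.9, 0.0], [0.0, 1.0], [0.05, 0.9]]
good = cluster_guided_loss(v1, v2, [0, 0, 1, 1], [True] * 4)
bad = cluster_guided_loss(v1, v2, [0, 1, 0, 1], [True] * 4)
```

With cluster labels that match the geometry, the loss is small; with scrambled labels it is much larger, which is the signal such an objective would use to sharpen the clusters.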


High-Quality Supersampling via Mask-reinforced Deep Learning for Real-time Rendering

Hongliang Yuan, Boyu Zhang, Mingyan Zhu, Ligang Liu, Jue Wang

Category: Computer Vision

2023-01-03

To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, which significantly improves their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our proposed method generates higher-quality supersampling results than current state-of-the-art methods, without increasing the total number of ray-tracing samples.
